Biased sampling driven by bacterial population structure confounds machine learning prediction of antimicrobial resistance
Machine-learning (ML) models are increasingly used to predict antimicrobial resistance (AMR) from bacterial genome data, but their performance is strongly undermined by hidden biases in how the data are collected. Using more than 24,000 genomes from five pathogens, this study shows that bacterial population structure and over-representation of human disease isolates can lead models to associate resistance with lineage rather than with the underlying biology. As a result, many ML models perform poorly, even when trained on very large datasets. Model accuracy varies by species and antibiotic, so one-size-fits-all approaches do not work. Overall, the findings show that current ML methods for AMR prediction often fail under realistic conditions. Such methods must explicitly account for population structure and sampling bias, and they need more diverse and representative genomic datasets to support them.